ABSTRACT

The proliferation of online social networks in the last decade has not stopped short of pets, and many different online platforms now exist catering to owners of various pets such as cats and dogs. These online pet social networks provide a unique opportunity to study an online social network in which a single user manages multiple user profiles, i.e. one for each pet they own. These types of multi-profile networks allow us to investigate two questions: (1) What is the relationship between the pet-level and human-level network, and (2) what is the relationship between friendship links and family ties? Concretely, we study the online pet social networks Catster, Dogster and Hamsterster, the first two of which are the two largest online pet networks in existence. We show how the networks on the two levels interact, and perform experiments to find out whether knowledge about friendships on the profile level alone can be used to predict which users are behind which profile. In order to do so, we introduce the concept of a multi-profile social network, extend a previously defined spectral test of diagonality to multi-profile networks, define two new homophily measures for multi-profile social networks, perform a two-level social network analysis, and present an algorithm for predicting whether two profiles were created by the same user. As a result, we are able to predict with very high precision whether two profiles were created by the same user. Our work is thus relevant for the analysis of other online communities in which users may use multiple profiles.

1. INTRODUCTION 

Pet ownership is common in many countries. In the United States for instance, 47% of households owned at least one dog, and 46% at least one cat in 2012 (see footnote 1). It therefore comes as no surprise that specialized online social networking platforms exist specifically for pets. In general, online social networks may range from the very generic such as Facebook and Twitter, to the very specialized for dedicated communities related to hobbies, activities or professions. Nevertheless, the specific topic that unifies the community usually does not affect the basic mechanism of an online social network: A user creates an account to connect with other users. Online pet social networks are however different in this regard. In online pet networks, users can create any number of accounts, one for each pet they own. While individual persons cannot usually be stopped from creating multiple accounts in an ordinary online social network, this is usually frowned upon. On Wikipedia for instance, the use of multiple accounts by a single user is restricted to a narrow list of special cases which includes testing, running bots, or users who have been assigned official roles. Outside of these, the use of multiple accounts is proscribed, and when used for disruption is called sock puppetry [23]. For these reasons, the use of multiple accounts by users in online social networks is a rarely studied problem, and few datasets for its study exist. As an example, one study performs the task of predicting whether a given Wikipedia account is a sock puppet on fewer than a hundred accounts [18]. In contrast to this, we are able to perform a study on several hundreds of thousands of users in this paper.

Figure 1: Entities and relationships in Catster and Dogster. Individual owners may own both cats and dogs, but friendship links across the two sites are not possible. The sites are both unconnected to Hamsterster.

Table 1: Datasets analysed.

Dataset              #Pets      #Friendships   #Households   Pets per household
Catster              204,424    5,443,885      105,089       1.95
Dogster              451,710    8,543,549      260,390       1.73
Catster + Dogster    623,766    13,991,746     333,111       1.87
Hamsterster          2,950      12,531         1,575         1.87

In online pet social networks, a single user may (and is expected to) create one profile for each owned pet. All social networking functionality such as entering personal information, creating friendship links to others, etc., is then performed on the pet level. Figure 1 illustrates how the multiple pet profiles created by a user form a family of pets. With their structure that allows multiple profiles per account, online pet social networks thus make it possible to investigate the following questions:

• How does the fact that individual users own multiple 
profiles influence the structure of the social network? 

• Is it possible to predict that two accounts are managed 
by the same person? 

These questions are analysed under multiple aspects in the remainder of the paper. In Section 2 we review related work and in Section 3 we describe our three datasets. In Section 4 we perform social network analysis, in order to determine crucial differences between the profile-level and account-level networks. In Section 5 we investigate the homophily on both levels, asking whether the account-level network is characterized by higher homophily values, and if yes, for which node properties this is true. In Section 6 we perform a spectral analysis of the networks, for which we introduce an extended spectral diagonality test in order to compare friendships with family ties. In Section 7 we analyse the problem of predicting that two profiles were set up by the same account, with the goal of finding out whether this is possible at all, and if yes, which structural and metadata properties are suitable for this task. Section 8 concludes the paper.

2. ONLINE PET NETWORKS 

The analysis of social networks has its roots in the social sciences [17]. More recently, the use of social network datasets extracted from online social networking platforms has led to a large amount of research in computer science and network science. Online social networks allow people to connect via a platform in order to communicate, share content, or simply manage a list of connections for various purposes. In most such platforms, a single user account is used to manage a single user profile, to which users can add information such as their age, location, sex, favorite movies, songs, food, or any other metadata deemed interesting to the particular community. In only few cases can multiple profiles be created by a single user. An example is given by company or product pages on Facebook, of which one user can create more than one. In that case however, there may be more than one user managing each profile, resulting in group-like semantics rather than profile-like semantics. In most online social networking platforms, the creation of multiple profiles by one user is not allowed, only possible by using multiple email addresses, or restricted to very specific users. On Wikipedia for instance, multiple accounts created by a single person are referred to as sock puppets, and are proscribed [23]. Therefore, few datasets are available and only little research has been conducted on the topic, an example being the detection of sock puppets on Wikipedia [18] using one hundred accounts. Text mining approaches to detect sock puppets in Wikipedia have been described, too [19]. Online pet social networks such as Catster, Dogster and Hamsterster thus present a unique opportunity to study a social network in which users manage multiple profiles. What is more, because this is not proscribed by the sites, but instead represents the normal way of using them, information about identical users is openly available on these sites, making this study possible.

Many specialized online networking platforms exist, and online pet social networking platforms specifically have been studied before, although social network analyses have not been performed on them. Related work analysing online animal social networks has covered Catster, Dogster and Hamsterster, but only used small samples of the full networks for analysis: 2,000 dogs and 2,000 cats in [5], and 10,000 dogs and 10,000 cats in [24]. None of these works performs a network analysis. The latter paper asks the question whether knowledge about family ties can improve prediction of friendship ties; the question is answered positively.

A distinct topic is that of animal networks such as networks of sheep [7], dolphins [12] and macaques [21]. Those refer to social networks in which the actors are (usually wild) animals, whose social ties are not conditioned by humans. Another distinct concept is that of circles, as used for instance in Google+ [13]. Although families in pet networks have been called circles (e.g. in [24]), they are not the same concept as used on Google+. On Google+, a circle is a device to group one’s own friends. Hence, circles do not provide a new type of link beyond friendships, and cannot be compared to the families of online pet social networks.

3. DATASETS 

We use datasets of Catster, Dogster and Hamsterster. Since Dogster and Catster share user accounts, we also report statistics on the union of these two. An overview of the datasets is given in Table 1. catster.com and dogster.com were both founded in 2004 [8]. Both sites are linked: A single user can create pet profiles on both sites, and individual cat and dog profile pages are interlinked via a family link when they were created by the same user. hamsterster.com is an independent site created in 2003 or 2004 (see the footnote on its creation date). Hamsterster appears to have been shut down as of October 2014 (see the footnote on its closure). Other such “online social petworks” exist, such as bunspace.com for rabbits, but are not studied in this paper. The suffix -ster in these names was likely chosen as a reference to friendster.com, created in 2002. We crawled Catster and Dogster from August 2011 to March 2012, and Hamsterster in February 2012.

Footnote (creation date): The exact creation date of Hamsterster is not known to us. The oldest accounts there date from 2003, but the domain hamsterster.com was registered in 2004 (see footnote 4), and the phrase “after nearly ten years” written in October 2014 on Twitter (see the footnote on its closure) suggests a creation date of 2004.

Footnote (closure): As of October 2014, the Twitter account @HAMSTERster states that Hamsterster had been closed “after nearly ten years”.

Figure 2: Demographic characteristics of the pet networks. (a) Distribution of join dates, i.e., pet profile creation dates. The oldest profiles have dates in 2003, on Dogster and Hamsterster. The newest accounts crawled by us were created in early 2012. (b-c) Population pyramids of Catster and Dogster, showing the distribution of ages and sexes.

On all three sites, a single user can create profiles for any number of pets. Catster and Dogster are connected, and thus a single user account can be used for both sites, although 90.3% of accounts across Catster and Dogster include only cats or only dogs. The group of pet profiles created by a single user makes up a household or family. Friendship links are allowed within a single household in Dogster and Catster, but are not allowed in Hamsterster. All friendship links are undirected.

Catster and Dogster allow only cats (Felis catus) and dogs (Canis lupus familiaris or Canis familiaris) respectively. Hamsterster allows multiple species of hamsters (subfamily Cricetinae) and gerbils (subfamily Gerbillinae), the most common species being the golden hamster (Mesocricetus auratus). The Hamsterster dataset contains at least one cat, a rat and five guinea pigs. We also found profiles in all three platforms apparently created for multiple pets (e.g., named “Hamster babies”). For each of the three sites, about two thirds of all users are located in the United States.


4. MULTI-PROFILE SOCIAL NETWORK ANALYSIS

The multi-profile social networks of Catster, Dogster and Hamsterster can be analysed using tools of social network analysis on two different levels: the profile level (pet level) and the account level (family or household level). By performing social network analysis, we can derive several properties from a multi-profile social network. First, we can derive the differences and similarities between the two networks. Second, we can ask which of the two is more similar to a typical social network, in order to assess whether the network is better modeled as an account-level network to which profiles are attached, or a profile-level network in which the profiles are aggregated into groups.


4.1 Definitions 

We now introduce a formal definition of a multi-profile social network, of which Catster, Dogster and Hamsterster are examples. A multi-profile social network is a social network in which each person is associated with one or more profiles, and in which the actual social relationships as well as the metadata such as age, sex and location are associated with individual profiles. In the online case, a multi-profile social network allows each user to manage one or more profiles. The set of profiles managed by a single account in a multi-profile social network may also be called a household or a family. The latter term in particular is used by the three studied online pet social networking sites.

We denote a multi-profile social network by $G = (V, W, E, m)$, where $V$ is the set of profiles, $W$ is the set of accounts, $E \subseteq V \times V$ is the set of friendship edges connecting profiles, and $m : V \to W$ is a mapping from profiles to accounts. Individual profiles will be denoted by the letters $u$, $v$, etc., while accounts will be denoted by the letters $i$, $j$, etc. As in other social networks, additional metadata for profiles, accounts and friendships may be defined. The online pet social networks we study include extensive profile metadata (described in Section 5.1), but do not include account metadata, because they present everything from the point of view of the pet. The graph $G_p = (V, E)$ then represents the profile-level social network, while $G_a = (W, m(E))$ represents the account-level social network, using the definition

$$m(E) = \{ \{i, j\} \mid i \neq j \wedge \exists \{u, v\} \in E : m(u) = i \wedge m(v) = j \}, \qquad (1)$$

that is, $G_a$ is the result of identifying vertices in $G_p$ that are in the same household, not including loops in the result. An overview of the differences between the two levels of networks is shown in Table 2 in terms of numerical statistics.

Table 2: Network statistics in the profile-level and in the account-level social network for the three sites.

                                Profile-level network G_p = (V, E)     Account-level network G_a = (W, m(E))
Statistic                       Cat          Dog          Ham.         Cat          Dog          Ham.
#Nodes                          204,473      451,710      2,952        105,138      260,390      1,576
#Edges                          5,448,197    8,543,549    12,534       494,858      2,148,179    4,032
Average degree                  53.29        37.82        8.49         9.41         16.50        5.12
Largest connected component     72.79%       94.42%       60.57%       64.98%       98.30%       55.46%
Power-law exponents (a)         2.12 (19)    2.15 (26)    2.46 (20)    2.27 (8)     2.27 (18)    2.14 (7)
Gini coefficient (b)            77.10%       75.06%       61.06%       72.93%       72.36%       63.02%
Clustering coefficient          1.10%        1.43%        9.04%        0.38%        1.01%        13.13%
Diameter (c)                    10           11           14           10           10           8
Mean path length (c)            2.73         3.39         3.42         2.62         3.36         3.17

(a) The minimum degree d_min at which the power law was fitted is shown in parentheses
(b) Measured using the method from [12]
(c) Measured in the largest connected component
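The construction of the account-level edge set in Eq. (1) can be illustrated with a minimal sketch. The function name `account_level_edges` and the toy data are hypothetical and chosen only for illustration; they are not taken from the datasets.

```python
def account_level_edges(edges, m):
    """Collapse profile-level friendship edges into account-level edges.

    edges: iterable of (u, v) profile pairs, i.e. the undirected edge set E.
    m:     dict mapping each profile to its account, i.e. the mapping m: V -> W.
    Returns the set m(E) of unordered account pairs {i, j} with i != j.
    """
    result = set()
    for u, v in edges:
        i, j = m[u], m[v]
        if i != j:                         # loops (intra-household friendships) are dropped
            result.add(frozenset((i, j)))  # the set merges parallel edges into one
    return result


# Toy example (hypothetical data): five pets in three households.
m = {"cat1": "A", "cat2": "A", "dog1": "B", "dog2": "B", "cat3": "C"}
E = [("cat1", "cat2"), ("cat1", "dog1"), ("cat2", "dog2"), ("dog1", "cat3")]
account_edges = account_level_edges(E, m)
print(sorted(sorted(e) for e in account_edges))
```

In this toy example, four profile-level friendships collapse to two account-level edges: one is a loop within household A, and two connect the same pair of households. Both effects contribute to the smaller edge counts of the account-level networks in Table 2.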

4.2 Demographic Characteristics 

The distribution of sexes and ages of pets is shown in Figure 2(b-c). Both sexes are equally distributed in Catster and Dogster, and the age distribution reflects the pets’ life spans. On average, there are about two pets per household. The average number of pets per household is consistent over all three pet types; it is 1.95 for cats, 1.73 for dogs and 1.87 for hamsters (see Table 1). The distribution of pets per household (shown in Figure 3(a)) is power law-like, with similar power law exponents for all three sites. The fitted power law exponents using the method described in [16, Eq. (5)-(6)] are 3.62 for Hamsterster ($p_{\min} = 5$), 3.63 for Catster ($p_{\min} = 6$), 3.90 for Dogster ($p_{\min} = 4$) and 3.79 for Catster and Dogster combined ($p_{\min} = 5$). The fitted parameter $p_{\min}$ denotes the starting point of the fit.

The fact that the number of pets per household follows a power-law distribution closely is interesting. In usual social networks, power-law degree distributions are explained through a process of preferential attachment, i.e., persons with many friends are more likely to make new friends. In the case of profiles, it would mean that accounts with many profiles are more likely to create new profiles. Whether this is the correct explanation cannot be determined from the data however. Nonetheless, the distributions of pets per household follow power laws much more closely than the number of friends per profile.

Since there are about two pets per household, the account-level networks have about half as many nodes as the profile-level networks. In terms of the number of edges (the volume of the network), the account-level networks are smaller by a factor of ten (Catster), four (Dogster) and three (Hamsterster). The lower value for Hamsterster can be explained by the fact that Hamsterster does not allow friendship edges within families, but also by the fact that in Hamsterster, the average number of friendships is lower (8.5) than in Catster (53.3) and Dogster (37.8).

4.3 Are Pet Networks Scale-free? 

The distribution of the node degrees in a network is an important characteristic of the network. Many network models such as the preferential attachment model [2] predict the degree distribution to be scale-free, i.e., the number of nodes with degree $d$ to be proportional to the power $d^{-\gamma}$ for some constant $\gamma$. Along with estimating $\gamma$, we also used the Gini coefficient to measure the equality of the friendship distribution [12].

The degree distributions of the profile-level networks as well as the account-level networks are plotted in Figure 3(b), and the values of the fitted power-law exponent $\gamma$ and the Gini coefficient are given in Table 2. The power law exponent is computed using a minimum degree $d_{\min}$, using the robust method given in [16, Eq. (5)-(6)].
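As an illustration of fitting a power-law exponent above a minimum degree, the following minimal sketch uses the common continuous maximum-likelihood estimator $\gamma = 1 + n / \sum_k \ln(d_k / d_{\min})$ together with the approximate standard error $(\gamma - 1)/\sqrt{n}$. Whether this matches Eq. (5)-(6) of [16] exactly is an assumption, and the function name `fit_power_law_exponent` is hypothetical.

```python
import math


def fit_power_law_exponent(degrees, d_min):
    """Maximum-likelihood estimate of a power-law exponent (assumed estimator).

    Only degrees >= d_min enter the fit. Returns (gamma, sigma), where sigma
    is the approximate standard error of gamma.
    """
    tail = [d for d in degrees if d >= d_min]
    n = len(tail)
    log_sum = sum(math.log(d / d_min) for d in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need at least one degree strictly above d_min")
    gamma = 1.0 + n / log_sum
    sigma = (gamma - 1.0) / math.sqrt(n)
    return gamma, sigma


# Hypothetical degree sequence; degrees below d_min are ignored by the fit.
example_degrees = [1, 2, 3, 5, 5, 8, 13, 21, 40, 100]
gamma, sigma = fit_power_law_exponent(example_degrees, d_min=5)
print(gamma, sigma)
```

The choice of `d_min` matters: Table 2 reports a different minimum degree for each network, so exponents fitted with different cut-offs are only roughly comparable.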

Beyond the fact that the average degree is lower in the account-level networks than in the profile-level networks, we observe that in Catster and Dogster, the power-law exponent $\gamma$ is larger for the account-level network than for the profile-level network, while the Gini coefficient is smaller in the account-level network than in the profile-level network. Both observations are consistent with each other, as a small Gini coefficient and a large power-law exponent both denote a more equal degree distribution [12]. This indicates that the account-level networks have a more equal distribution of degrees than the profile-level networks, i.e., the account-level networks are more regular. Both statistics are however in the range usual for social networks; $\gamma$ is in the range 2.1-2.5 and the Gini coefficient is in the range 61-77% (Table 2).
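To make the notion of an unequal degree distribution concrete, the following minimal sketch computes the Gini coefficient of a degree sequence with the standard rank-based formula. This is an assumption for illustration: the method of [12] used for Table 2 may differ in detail, and the function name `gini_coefficient` is hypothetical.

```python
def gini_coefficient(degrees):
    """Gini coefficient of a non-negative degree sequence.

    0 means all nodes have the same degree; values near 1 mean that a few
    nodes hold almost all edge endpoints.
    """
    values = sorted(degrees)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty sequence with positive sum")
    # Rank-weighted sum over the ascending sequence, ranks starting at 1.
    weighted = sum(rank * d for rank, d in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical degree sequences: a regular one and a highly skewed one.
print(gini_coefficient([3, 3, 3, 3]))
print(gini_coefficient([0, 0, 0, 4]))
```

A perfectly regular network gives a coefficient of zero, so the lower values for the account-level networks of Catster and Dogster in Table 2 correspond to degree distributions that are closer to regular.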

5. HOMOPHILY IN PET NETWORKS 

The term homophily refers to the tendency of people connected through social ties to be similar to each other. More precisely, homophily can be measured by a network’s assortativity with respect to a given node property. A network then displays positive homophily (assortativity) when two randomly chosen connected persons are more similar than two randomly chosen persons without regard to connections [15]. Inversely, a network displays negative homophily (dissortativity) when the opposite is the case. By analysing the homophily in online pet social networks, we want to answer the following questions:

• Which is higher, the homophily between friends, or within families? If the homophily between friends is higher, this would indicate that the pets are the primary actors in the networks, and that families are merely organizational structures, so that a proper social network analysis would have to consider the pet-level network. On the other hand, a higher homophily within families would indicate that the family (or household) is the primary social structure in the network, and that a social network analysis would have to consider the household-level structure to accurately reflect the social structure.

• Which profile properties correlate with two pets being friends, and with two pets being in the same household? The features indicative of a shared household will give insight into the users’ choice of pets, while the features indicative of friendship links will be indicative of the social networking behavior of users.

In order to answer these questions, we propose two complementary assortativity coefficients that apply to multi-profile social networks, whose ratio is a measure of the relative strength of intra-household homophily as compared to across-friendship homophily.

Figure 3: Power law-like distributions in pet networks. (a) Complementary cumulative distributions of pets per household for the three sites, as well as for Catster and Dogster combined (as single accounts may create profiles on both sites). (b) Complementary cumulative degree distribution (number of friendships) in the profile-level [p] and account-level [a] networks.

5.1 Methodology 

Many different node properties can be subject to homophily analysis, and the exact method used for measuring it depends on the data type considered. In the online pet social networks we analyse, the data that can be added to a pet’s profile fall into three categories:

• Categorical variables

  - The sex of a pet (male / female). The sex is a mandatory field for all pets.

  - The race of a pet. For cats and dogs, the race corresponds to the breed. For hamsters, the race corresponds to one of multiple species of hamsters and gerbils. The race is a mandatory field for all pets.

  - The pet’s coloration. The coloration is mandatory for all hamsters and optional for cats (69% of profiles include it). It is not used on Dogster.

• Numerical variables

  - The profile creation date. It is known for all pets on all three sites.

  - The birth date. The birth date is mandatory for all hamsters, and optional on Catster and Dogster. It is known for 76% of cats and 80% of dogs.

  - The weight. On Catster, the weight can be specified as an exact number in pounds, and is known for 58% of cats. On Dogster, one out of five ranges can be chosen (1-10 lbs, 11-25 lbs, 26-50 lbs, 51-100 lbs, 100+ lbs). The weight is not used on Hamsterster.

  - The number of friends of a pet.

• The location (“home”) of a pet can be specified on all three sites. We converted the location strings to latitude-longitude pairs using the Google Geocoding API [6]. The geolocation is known for 68% of cats, 78% of dogs and 99% of hamsters.

We additionally use as a feature the join age, defined as the age of the pet at the time of profile creation.

We define two measures of assortativity for multi-profile networks: one that measures homophily on the profile friendship level ($r_p$) and one that measures homophily on the account level ($r_a$). For the friendship level, we consider the friendship edges between pets in the networks. For the account level, we consider all pairs of pets that are in the same household. As in most social networks, we expect to observe a certain amount of homophily in the pet friendship network. We further hypothesize that the homophily between pets within a single household is larger than the homophily for pets connected by friendship links. Therefore, we compute measures of homophily for both levels, based on the available pet characteristics.

For categorical variables, we base the assortativity coefficients on [15, Eq. (2)]. Let $C$ be the set of possible values of the categorical variable, $P_x(i, j)$ the probability that a randomly chosen connected pair of profiles (either via a friendship edge for $x = p$, or in the same household for $x = a$) are in the categories $i \in C$ and $j \in C$ respectively, and $P_x(i) = \sum_j P_x(i, j)$. Then, we define the friendship assortativity coefficient $r_p$ and the household assortativity coefficient $r_a$ using

$$r_x = \frac{\sum_i P_x(i, i) - \sum_i P_x(i)^2}{1 - \sum_i P_x(i)^2}. \qquad (2)$$

The assortativity coefficients defined in this way equal one for perfect positive homophily, and lie between negative one and zero for negative homophily (see footnote d).

For numerical variables, we use the Pearson correlation coefficient between the numerical properties of connected pets, as defined in [15, Eq. (20)]. Let $\mathrm{var}_x(X)$ be the variance of the numerical profile characteristic weighted by the number of neighbors of the profile in the friendship graph, and $\mathrm{cov}_x(X, Y)$ the covariance between the characteristics of pairwise connected profiles, using again $x = p$ for friendship connections and $x = a$ for pairs of profiles of the same account. Then the assortativity coefficients $r_p$ and $r_a$ are given by

$$r_x = \frac{\mathrm{cov}_x(X, Y)}{\mathrm{var}_x(X)}. \qquad (3)$$

Note that this expression is simplified from the usual Pearson correlation coefficient because the relationships are symmetric. The values of $r_x$ range from $-1$ to $+1$ and are $+1$ for perfect positive homophily and $-1$ for perfect negative homophily.

For the geolocation, we use the distance correlation [20] as a measure of homophily, based on the great circle distance between pairs of locations. Since locations are two-dimensional, the distance correlation can capture the strength of the correlation, as the Pearson correlation does, but cannot represent the direction of the correlation. Therefore the distance correlation ranges from zero to one, with one denoting perfect correlation and zero denoting no correlation. The location is always the same for pet profiles created by a single user and therefore the family-level homophily for the location is always trivially one.

All three types of assortativity measures are zero when neither positive nor negative homophily is observed. To compare the assortativity coefficients on the friendship level and on the account level, we define the multi-profile assortativity ratio of a profile characteristic as

$$r_{\mathrm{rel}} = \frac{r_a}{r_p}. \qquad (4)$$

By construction, $r_{\mathrm{rel}}$ is larger than one if the assortativity is higher within profiles of one account than across friendships, and smaller than one if it is the assortativity across friendships that is higher.

Footnote d: $r_x$ cannot be exactly $-1$; see [15] for an explanation.
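The coefficients of Eq. (2) and Eq. (3) and the ratio of Eq. (4) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names and the toy data are hypothetical, each unordered pair is counted in both orientations to make the joint distribution symmetric, and the ratio is computed as the plain quotient $r_a / r_p$ as written in the text. Table 3 lists positive ratios even where $r_p$ is negative, so the published values may involve absolute values, which this sketch does not assume.

```python
from collections import Counter


def categorical_assortativity(pairs, category):
    """Assortativity coefficient for a categorical variable, as in Eq. (2)."""
    joint = Counter()
    for u, v in pairs:                       # count both orientations of each pair
        joint[(category[u], category[v])] += 1
        joint[(category[v], category[u])] += 1
    total = sum(joint.values())
    marginal = Counter()
    for (i, _j), count in joint.items():
        marginal[i] += count
    same = sum(c for (i, j), c in joint.items() if i == j) / total
    expected = sum((c / total) ** 2 for c in marginal.values())
    # Undefined (division by zero) if every profile has the same category.
    return (same - expected) / (1.0 - expected)


def numerical_assortativity(pairs, value):
    """Pearson-type assortativity for a numerical variable, as in Eq. (3)."""
    xs, ys = [], []
    for u, v in pairs:                       # symmetrize: (u, v) and (v, u)
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean = sum(xs) / n                       # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


# Toy data (hypothetical): four cats in two households.
breed = {"p1": "siamese", "p2": "siamese", "p3": "persian", "p4": "persian"}
friend_pairs = [("p1", "p2"), ("p3", "p4"), ("p1", "p3")]   # x = p
household_pairs = [("p1", "p2"), ("p3", "p4")]              # x = a

r_p = categorical_assortativity(friend_pairs, breed)
r_a = categorical_assortativity(household_pairs, breed)
r_rel = r_a / r_p                            # Eq. (4)
print(r_p, r_a, r_rel)
```

In the toy data, every household is of a single breed while one friendship crosses breeds, so the household-level coefficient exceeds the friendship-level one and the ratio is larger than one.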


5.2 Discussion 

Table 3 shows the complete homophily analysis. For all features, the homophily within households is larger than the homophily between friends, and thus all multi-profile assortativity ratios are larger than one. This indicates, as we would expect from pets, that the underlying social network is primarily one of humans and not one of pets. However, the pet friendship network is not completely unassortative, as it displays positive assortativity ($r_p > 0.4$) by join date for all three sites.

For the intra-household homophily, high values ($r_a > 0.5$) can be observed for the join date and the number of friends. Small positive assortativity ($r_a > 0.1$) can be observed for the race, the birth date, the join age, and the pet’s weight. The largest multi-profile assortativity ratios ($r_{\mathrm{rel}} > 10$) can be observed for the breed in Catster, the number of friends in Hamsterster, the join age in Catster and Hamsterster, and the pet weight in Catster.

In terms of race, Dogster has a particularly high intra-household homophily, indicating that owners of several dogs tend to prefer dogs of the same breed, while this is only true to a small extent for cats and hamsters. The sex and coloration of pets show no homophilic tendencies. The number of friends of a pet shows negative assortativity on the friendship level, and positive assortativity within households. This indicates that while the friendship ties display the usual degree dissortativity of real social networks, the numbers of friends of pets within one household are similar, and therefore the degree of a pet is a function of the owner, not of the pet. The homophily with respect to the join date and birth date is higher in Hamsterster. This is consistent with the fact that hamsters have shorter lives.

In conclusion, we find that the intra-household homophily is higher than the friendship homophily. Thus, with respect to profile features, these pet social networks largely follow the underlying human social networks. This conclusion is however only based on profile properties, and does not take into account the network structures. Therefore, we investigate the pet and human-level network structures in the next section.

Table 3: Homophily analysis comparing the strength of homophily across friendships r_p and the strength of homophily within accounts r_a. The multi-profile assortativity ratio is shown as r_rel.

                      Catster                            Dogster                            Hamsterster
Property              r_p         r_a         r_rel      r_p         r_a         r_rel      r_p         r_a         r_rel
Race (a)              0.0138++    0.3137++    22.748     0.1556++    0.3065++    1.970      0.0973++    0.5349++    5.497
Sex (a)               0.0048++    0.0472++    9.848      0.0075++    0.0154++    2.040      0.0083+     0.1180++    14.264
Coloration (a)        0.0076++    0.0599++    7.864      -           -           -          0.0219++    0.1166++    5.325
Weight range (a,c)    -           -           -          0.1498++    0.2590++    1.729      -           -           -
#Friends (b)          -0.5232**   0.7629**    1.158      -0.2893**   0.6487**    2.242      -0.0310**   0.6859**    22.108
Birth date (b)        0.0406**    0.2930**    7.216      0.0585**    0.2114**    3.613      0.3542**    0.5614**    1.585
Join date (b)         0.4219**    0.7327**    1.737      0.5584**    0.7268**    1.302      0.5723**    0.8266**    1.444
Join age (b)          0.0187**    0.2600**    13.878     0.0475**    0.1738**    3.663      0.0317**    0.3615**    11.405
Weight (b,d)          0.0087**    0.1827**    20.991     -           -           -          -           -           -
Location (e)          0.0888*     -           -          0.1112**    -           -          0.1863**    -           -

++ and + denote an estimate on the error of less than 0.1% and 1%, respectively
** and * denote a p-value of less than 0.001 and 0.01, respectively
(a) Categorical variable; numbers denote the assortativity coefficient [15, Eq. (2)]
(b) Numerical variable; numbers denote the Pearson correlation coefficient [15] (see Section 5.1)
(c) In Dogster, the weight can only be chosen from a predefined set of ranges
(d) In Catster, the exact pet weight can be specified
(e) Not computed for households as all pets in one household share their location

6. RELATIONSHIP BETWEEN FRIENDSHIPS AND FAMILY TIES

So far, we have analysed the friendship and family ties on an individual level. We now perform several experiments to analyse the available networks as a whole, and to determine the relationships between the friendship network and family tie network at the structural level. In order to do so we extend the spectral diagonality test described in [10], which was originally used to analyse the temporal evolution of a network, to the comparison of the friendship network with the ownership structure in the multi-profile network. The result is a test that allows us to directly observe relationships between both structures, and a measure of the consistency between friendships and family ties.


6.1 Definitions 


The graphs $G_p$ and $G_a$ can be represented by the adjacency matrices $A_p \in \{0,1\}^{|V| \times |V|}$ and $A_a \in \{0,1\}^{|W| \times |W|}$, defined as follows:

$$(A_p)_{uv} = \begin{cases} 1 & \text{when } \{u,v\} \in E \\ 0 & \text{when } \{u,v\} \notin E \end{cases} \tag{5}$$

$$(A_a)_{ij} = \begin{cases} 1 & \text{when accounts } i \text{ and } j \text{ are adjacent in } G_a \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Both matrices are symmetric. We also define a matrix giving the relationship between profiles and accounts. Let $R \in \{0,1\}^{|V| \times |W|}$ be the matrix defined by

$$R_{ui} = \begin{cases} 1 & \text{when } m(u) = i \\ 0 & \text{when } m(u) \neq i \end{cases} \tag{7}$$

$R$ is rectangular, and by definition each row has a single entry equaling one. By construction, the following relationship holds:

$$A_a = [R^T A_p R] \tag{8}$$

where the matrix operator $[X]$ rounds all nonzero entries of $X$ to one, and sets all diagonal entries to zero. We also define the family matrix $F \in \{0,1\}^{|V| \times |V|}$ whose entries equal one when two profiles are managed by the same account and zero otherwise:

$$F_{uv} = \begin{cases} 1 & \text{when } m(u) = m(v) \\ 0 & \text{when } m(u) \neq m(v) \end{cases} \tag{9}$$

The following relationship can then be established:

$$F = R R^T \tag{10}$$

Note that the diagonal elements of $F$ are all one, since every profile is in the same account as itself.
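
The definitions above can be illustrated on a small example. The following is a minimal sketch, assuming NumPy and a hypothetical toy network of four profiles and two accounts; the function names `membership_matrix` and `account_adjacency` are illustrative and do not come from the paper.

```python
import numpy as np

def membership_matrix(m, n_accounts):
    """Build R (|V| x |W|): R[u, i] = 1 exactly when profile u belongs to account i."""
    R = np.zeros((len(m), n_accounts), dtype=int)
    R[np.arange(len(m)), m] = 1
    return R

def account_adjacency(A_p, R):
    """A_a = [R^T A_p R]: nonzero entries become one, the diagonal becomes zero."""
    A_a = (R.T @ A_p @ R != 0).astype(int)
    np.fill_diagonal(A_a, 0)
    return A_a

# Hypothetical toy data: profiles 0 and 1 belong to account 0, profiles 2 and 3 to account 1.
m = np.array([0, 0, 1, 1])
# Friendships 0-1 (within one account) and 0-2 (across accounts).
A_p = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 0]])

R = membership_matrix(m, 2)
F = R @ R.T                    # family matrix: F[u, v] = 1 exactly when m(u) == m(v)
A_a = account_adjacency(A_p, R)
```

In this toy case the friendship between profiles 0 and 1 disappears at the account level, because it would only produce a diagonal entry, while the friendship between profiles 0 and 2 becomes a link between the two accounts.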


6.2 Methodology 

We seek to compare the friendship-level network and the family tie network using a spectral diagonality test, a technique that was initially introduced to study time-evolving networks under the spectral evolution hypothesis, i.e., the hypothesis that under time evolution, the eigenvalues of a network's adjacency matrix change while its eigenvectors stay nearly constant [10]. Two matrices with the same eigenvectors are related by spectral transformations [11], and if they are adjacency matrices, their relationship indicates how one type of edge is related to the other. If $A_1$ and $A_2$ are the adjacency matrices of a single network at two different timepoints, defined on the same node set, then the spectral diagonality test first computes the rank-$k$ eigenvalue decomposition

$$A_1 = U \Lambda U^T, \tag{11}$$

and then sets out to write an eigenvalue decomposition-like expression for $A_2$, using the same eigenvector matrix $U$ as for the first matrix:

$$A_2 = U \Delta U^T. \tag{12}$$

If both $A_1$ and $A_2$ have the same set of eigenvectors, then the last equation is a proper rank-$k$ eigenvalue decomposition of $A_2$, and $\Delta$ gives its eigenvalues. Solving for $\Delta$ gives

$$\Delta = U^T A_2 U. \tag{13}$$

If the $k$-by-$k$ matrix $\Delta$ is diagonal, then the spectral evolution hypothesis is true, and if $\Delta$ is nearly diagonal, then the hypothesis is nearly true. Furthermore, comparing the diagonal entries of $\Lambda$ and $\Delta$ gives an indication as to the actual algebraic function connecting the two matrices, such as matrix powers or exponentials [11].

Figure 4: The spectral diagonality test matrices $\Delta$ for the three sites, restricted to the topmost 50 x 50 submatrix corresponding to the largest eigenvalues of $A_p$. (a-c) The spectral diagonality test matrix $\Delta$. (d-f) Comparison plot between the diagonal entries of $\Lambda$ and $\Delta$.

In the context of multi-profile networks, our goal is to learn the relationship between the friendship network and the family relationships. Thus, we apply the spectral diagonality test to the matrices $A_p$ and $F$. First, we compute the rank-$k$ eigenvalue decomposition of the friendship adjacency matrix:

$$A_p = U \Lambda U^T \tag{14}$$

We then compute $\Delta$:

$$\Delta = U^T F U = U^T R R^T U$$

Testing the $k$-by-$k$ matrix $\Delta$ for diagonality then gives an indication of whether both matrices are related, and the relationship between the matrices $\Lambda$ and $\Delta$ gives an indication of the path relationships between friend and family relations. We use the value $k = 250$ in all calculations. We additionally define the coefficient of diagonality, which measures what proportion of the matrix $F$ is explained by a spectral transformation of $A_p$. We define the coefficient of diagonality as the proportion of squared entry weights in $\Delta$ that lie on the diagonal:

$$\delta = \frac{\sum_{i} \Delta_{ii}^2}{\sum_{i,j} \Delta_{ij}^2} \tag{15}$$

The coefficient ranges from zero to one, and attains one when the two matrices have the exact same eigenvectors. The denominator is the squared Frobenius norm of $\Delta$, and since the Frobenius norm is invariant under orthogonal transformations, it follows that $\delta$ is the largest number such that $F$ can be written as a sum of a spectral transformation of $A_p$ and another matrix. Thus, $\delta$ denotes to what extent the family relationships are represented by friendships. Note that this coefficient works in an opposite way to well-known co-spectrality measures [9], which aim to measure how similar the eigenvalues of two matrices are, while $\delta$ aims to measure to what extent they share the same eigenvectors.
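
The test and the coefficient of diagonality can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a dense eigenvalue decomposition; the function name `diagonality` and the toy matrices are hypothetical, and selecting the $k$ eigenvalues by largest absolute value is an assumption, since the text only speaks of the largest eigenvalues.

```python
import numpy as np

def diagonality(A_p, F, k):
    """Rank-k spectral diagonality test of F against the eigenvectors of A_p.

    Returns the selected eigenvalues of A_p, the k-by-k matrix Delta = U^T F U,
    and the coefficient of diagonality delta.
    """
    eigvals, eigvecs = np.linalg.eigh(A_p)       # A_p is symmetric
    top = np.argsort(-np.abs(eigvals))[:k]       # assumption: largest absolute value
    U = eigvecs[:, top]
    Delta = U.T @ F @ U
    delta = np.sum(np.diag(Delta) ** 2) / np.sum(Delta ** 2)
    return eigvals[top], Delta, delta

# Hypothetical toy friendship network: a cycle over four profiles.
A_p = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)

# Sanity check: A_p squared is a spectral transformation of A_p,
# so Delta is diagonal and delta should equal one up to rounding.
_, _, delta_check = diagonality(A_p, A_p @ A_p, k=4)

# Family matrix for two accounts owning profiles {0, 1} and {2, 3}.
R = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
lam, Delta_fam, delta_fam = diagonality(A_p, R @ R.T, k=4)
```

For networks of the size studied here, a dense decomposition is not practical; a sparse solver such as `scipy.sparse.linalg.eigsh` would be one option for obtaining only the leading $k$ eigenpairs.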

6.3 Experiments 

We compute the matrix $\Delta$ as described above for the three sites, and show the result in Figure 4 (a-c). Furthermore, Table 4 shows the diagonality coefficient $\delta$ of the tests. The results show that all three datasets display a partial diagonality for the matrix $\Delta$. The diagonality coefficient $\delta$ is 20% for Dogster, 28% for Catster, and 55% for Hamsterster. We may conclude from this that friendship links and family ties are the most consistent with each other on Hamsterster. All three results are consistent with temporal network evolution results given in [10].

Table 4: The diagonality coefficient $\delta$ for the three sites.

| Dataset | $\delta$ |
|---|---|
| Catster | 0.2754 |
| Dogster | 0.2013 |
| Hamsterster | 0.5512 |

Additionally, we show in Figure 4 (d-f) the relationship between the diagonal elements of the matrix $\Lambda$ (the eigenvalues of $A_p$) and the diagonal elements of $\Delta$. This type of plot serves to find out which matrix function best maps one matrix to another [11]. The three mappings seen in the plot allow us to draw two conclusions. First, the plots are nearly symmetrical around the Y axis, indicating that the best mapping matrix function is an even function, i.e., paths of even lengths of friendships should be used to predict family ties. Secondly, for Catster and Hamsterster, the distribution of eigenvalues follows a nearly linear trend, indicating that a linear spectral graph transformation may be used, i.e., only short paths are relevant, and longer even paths (of length four, six, etc.) are not relevant. This is, however, not observed for Dogster.
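
The link between even mapping functions and even path lengths can be made explicit with a short derivation. It is only a sketch: it assumes that the mapping function $f$ is a polynomial or power series and that the full eigenvalue decomposition of $A_p$ is used; the coefficients $c_k$ are introduced here purely for illustration.

```latex
% Spectral transformation of A_p = U \Lambda U^T by f(\lambda) = \sum_k c_k \lambda^k
f(A_p) = U f(\Lambda) U^T = \sum_{k \ge 0} c_k A_p^k

% (A_p^k)_{uv} counts the walks of length k between profiles u and v.
% If f is even, i.e. f(\lambda) = f(-\lambda), all odd coefficients vanish:
c_1 = c_3 = \dots = 0
\quad \Longrightarrow \quad
f(A_p) = c_0 I + c_2 A_p^2 + c_4 A_p^4 + \dots
```

Under these assumptions only even powers of $A_p$ remain, so only sequences of an even number of friendship links contribute to the predicted family ties.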

7. PREDICTING FAMILY TIES 

A family tie can be thought to exist between two pets that are in the same family, i.e., whose profiles were created by the same user account. While on Catster, Dogster and Hamsterster tie information is readily available under the “Meet My Family” header, the fact that two profiles were created by the same person cannot be easily verified on other online social networking platforms. Therefore, pet networks present an opportunity to study the prediction problem of detecting which profiles were created by the same account, since they provide complete ground truth data for an evaluation of the task. Thus, we analyse in this section the task of predicting that two pets are in the same family, given only friendship links and pet-level profile metadata. This allows us to determine how well it can be predicted whether two profiles are from the same account, even when that information is not public. Since we have multiple types of profile data available, we can investigate how well each type of profile data supports this prediction. The experiment also serves to find out which properties of pets are consistent within a household, and which are independent of a household.

7.1 Prediction Methods 

Given a multi-profile social network $G = (V, W, E, m)$, we want to predict whether two profiles are managed by the same account, i.e., information contained in $W$ and $m$, using only the profile-level network $G_p = (V, E)$, including the metadata associated with it. In the case of pet social networks, we use the available pet profile information along with the pet-level friendship links for learning. We investigate the following indicators (i.e., features), each of which applies to a pair of profiles $\{u, v\}$:

• Degree difference: The difference of degrees.

• Friend: This feature is one if there is a friendship between the two profiles and zero otherwise.

• Common friends: The number of common friends between the two profiles.

• Jaccard index: The Jaccard index between the sets of friends of the two profiles [22]. This is related to the number of common friends, being normalized by the number of friends of either profile.

• Same race, sex, coloration, location, join date and weight: These features are one if the corresponding profile information is equal, and zero otherwise.

• Birth date, join date, join age and weight difference: The negative absolute difference between the corresponding values for the two profiles. We take the negative since we expect a small difference to be indicative of a same household, due to intra-household homophily.

The exact definitions are given in Table 5. We do not use geographical distance between the two profiles, because we know that if the distance is larger than zero, then the profiles must be in distinct households. Thus, we only use the “same location” feature. Note also that the geolocation is given only up to the city level, i.e., all pets in New York City will be counted as having the same location, leading to a large number of pets from different households but with the exact same location.


Table 5: Definitions of the features used for family tie prediction. Each feature is given as a function of an unordered profile pair $\{u, v\}$.

| Feature | Definition |
|---|---|
| Degree difference (a) | $\lvert \log(1 + d(u)) - \log(1 + d(v)) \rvert$ |
| Friend | $1$ when $\{u,v\} \in E$, $0$ otherwise |
| Common friends | $\lvert \{ w \in V \mid \{u,w\} \in E \wedge \{v,w\} \in E \} \rvert$ |
| Jaccard index | $\dfrac{\lvert \{ w \in V \mid \{u,w\} \in E \wedge \{v,w\} \in E \} \rvert}{\lvert \{ w \in V \mid \{u,w\} \in E \vee \{v,w\} \in E \} \rvert}$ |
| Same X | $1$ when $X(u) = X(v)$, $0$ otherwise |
| Difference in X | $-\lvert X(u) - X(v) \rvert$ |

(a) We use the logarithm because degrees are more evenly distributed on a logarithmic scale. The additive term of one is used to take into account degrees of zero.
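
The feature definitions of Table 5 translate directly into code. The following is a minimal sketch in plain Python; the data layout (a dictionary mapping each profile to the set of its friends), the function names, and the convention of returning a Jaccard index of zero when neither profile has friends are assumptions made for illustration.

```python
import math

def network_features(friends, u, v):
    """Network-based features of Table 5 for the unordered profile pair {u, v}.

    `friends` maps each profile to the set of its friends (hypothetical layout).
    """
    fu, fv = friends[u], friends[v]
    common = fu & fv
    union = fu | fv
    return {
        "degree_difference": abs(math.log(1 + len(fu)) - math.log(1 + len(fv))),
        "friend": 1 if v in fu else 0,
        "common_friends": len(common),
        # Assumption: the index is set to zero when both friend sets are empty.
        "jaccard": len(common) / len(union) if union else 0.0,
    }

def same(x_u, x_v):
    """'Same X' feature: one if both profiles share the value, zero otherwise."""
    return 1 if x_u == x_v else 0

def negative_difference(x_u, x_v):
    """'Difference in X' feature: negative absolute difference of numerical values."""
    return -abs(x_u - x_v)

# Hypothetical toy friendship data.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
features_ab = network_features(friends, "a", "b")
```

For the pair `{"a", "b"}` the two profiles are friends, share the single common friend `"c"`, and have three distinct friends between them, so the Jaccard index is one third.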

We also perform a logistic regression prediction, combining all features given above. Let $f_i(u,v)$ be the values of all features $i$ enumerated above. Then, a logistic regression model takes the form

$$f_{\mathrm{reg}}(u,v) = \left( 1 + \exp\left( -a - \sum_i b_i f_i(u,v) \right) \right)^{-1}. \tag{16}$$

The regression parameters $b_i$ as well as $a$ are learned using a training set of profile pairs. The training profile pairs are sampled from each dataset such that the set contains e pairs of profiles that are in the same household and e pairs of profiles that are not in the same household. This training set is disjoint from the test set, which is defined in a similar way below.
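
As an illustration of the logistic model, the following sketch evaluates the model for one pair and fits $a$ and $b_i$ on a balanced toy training set. It is not the fitting procedure used in the paper, which is not specified here: plain gradient ascent on the log-likelihood, the step size, the number of steps, and the toy feature values are all assumptions.

```python
import math

def f_reg(features, a, b):
    """Logistic model: 1 / (1 + exp(-a - sum_i b_i * f_i))."""
    z = a + sum(b_i * f_i for b_i, f_i in zip(b, features))
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, steps=2000, rate=0.1):
    """Learn a and b by gradient ascent on the log-likelihood (assumed optimizer)."""
    a, b = 0.0, [0.0] * len(X[0])
    for _ in range(steps):
        grad_a, grad_b = 0.0, [0.0] * len(b)
        for x, label in zip(X, y):
            err = label - f_reg(x, a, b)      # residual of the current model
            grad_a += err
            for i, x_i in enumerate(x):
                grad_b[i] += err * x_i
        a += rate * grad_a / len(X)
        b = [b_i + rate * g / len(X) for b_i, g in zip(b, grad_b)]
    return a, b

# Hypothetical balanced training set; features are (same location, Jaccard index).
X = [[1, 0.5], [1, 0.4], [0, 0.1], [0, 0.0]]
y = [1, 1, 0, 0]                              # 1 = same household, 0 = different
a, b = fit(X, y)
score_same = f_reg(X[0], a, b)
score_diff = f_reg(X[3], a, b)
```

After fitting, same-household pairs should receive scores above one half and different-household pairs scores below one half on this toy data.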

7.2 Experimental Setup 

In order to measure the accuracy of each prediction method, we use a test set defined in the same manner as the training set, i.e., we randomly sample e pet pairs known to be in the same family, and e pet pairs known not to be in the same family. This test set is disjoint from the training set used for learning the regression parameters. The accuracy of the prediction methods is measured using the area under the curve (AUC) [5], which measures the probability that our prediction gives the correct ordering when applied to two randomly chosen pairs of profiles. Thus, the AUC is 1/2 for a random prediction, and one for a perfectly accurate prediction. It is less than 1/2 for inverted predictions, i.e., prediction methods that become better when their values are negated. A perfectly inaccurate prediction has an AUC of zero. Table 6 gives the AUC values for each method separately and for the regression predictions, as well as the learned regression weights for each of the three sites.
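
The ordering interpretation of the AUC can be computed directly by comparing every same-family pair with every different-family pair. The sketch below is a minimal illustration in plain Python; the function name `auc` is hypothetical, and counting ties as one half is the usual convention but is an assumption here, as the text does not state how ties are handled.

```python
def auc(scores_same, scores_diff):
    """Probability that a same-family pair is scored above a different-family pair.

    Ties count as one half (assumed convention).
    """
    wins = 0.0
    for s in scores_same:
        for d in scores_diff:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(scores_same) * len(scores_diff))

perfect = auc([3, 2], [1, 0])      # every same-family pair ranked higher
inverted = auc([0, 1], [2, 3])     # every same-family pair ranked lower
tied = auc([1, 1], [1, 1])         # uninformative, constant scores
```

The quadratic loop is only meant to mirror the definition; for large test sets a rank-based computation would be preferable.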

7.3 Discussion 

We observe that in all three sites, pets in the same household can be detected with an AUC of over 99% using the regression predictor. This means that given two pairs of pets, one from the same household and one from two different households, our algorithm will detect which is which in over 99% of cases. This high value can be explained by the fact that certain individual indicators are already highly indicative of family ties.







Table 6: Results of family tie prediction.

| Feature | AUC Cat | AUC Dog | AUC Ham. | Weight Cat | Weight Dog | Weight Ham. |
|---|---|---|---|---|---|---|
| Degree difference | 82.3% | 75.7% | 72.3% | 0.09 | -0.27 | 0.22 |
| Friend (a) | 50.3% | 50.6% | — | 4.83 | 3.76 | — |
| Common friends | 79.0% | 91.5% | 71.7% | -0.46 | 0.71 | 4.98 |
| Jaccard index | 82.8% | 92.2% | 76.2% | 5.78 | 9.73 | 1.25 |
| Same race | 66.4% | 66.2% | 76.4% | 1.32 | 3.08 | 0.92 |
| Same sex | 51.9% | 50.3% | 54.2% | 0.07 | 0.02 | -0.09 |
| Same coloration (b) | 57.2% | — | 59.4% | 0.95 | — | 5.59 |
| Same location | 87.2% | 90.3% | 99.6% | 11.02 | 8.92 | 21.21 |
| Birth date difference | 53.7% | 50.1% | 73.5% | -0.41 | -0.30 | 0.42 |
| Same join date | 79.7% | 74.6% | 78.2% | 6.08 | 5.44 | 6.21 |
| Join date difference | 90.8% | 87.6% | 91.9% | 1.19 | 0.87 | -0.24 |
| Join age difference | 52.7% | 48.7% | 66.2% | 0.42 | 0.30 | -0.88 |
| Weight difference (c) | 41.6% | — | — | -0.01 | — | — |
| Same weight (c) | — | 61.9% | — | — | 0.52 | — |
| Regression | 99.3% | 99.6% | 99.9% | | | |

The columns labeled "Weight" give the learned regression weights.
(a) Hamsterster does not allow friendship links within one household.
(b) Dogster does not allow specifying a dog's coloration.
(c) Catster allows exact weights and Dogster has weight ranges.


The best individual predictor, the join date difference, achieves an AUC near 90% for all three sites, indicating that users often create multiple pet accounts in quick succession. This may be explained by the fact that the sites have only been in operation for a decade. After a longer time period of observation, we may expect this number to go down. In contrast to this, the birth date of a pet is not a good indicator for being in the same household (AUC near 50% for Catster and Dogster), indicating that users of the pet social networks do not have pets all born in quick succession; this is consistent with the behavior of many people acquiring new pets only after old ones die.

The location is a good individual indicator too, as by construction pets of the same household must have the same location.

Properties of pets such as the sex, the race, the coloration and the weight are not good indicators, with most AUC values not differing much from 1/2. The highest AUC values among these are achieved by the species of hamsters (76%), the breed of cats and dogs (66%) and the weight ranges on Dogster (62%). This indicates that there is a slight tendency for owners to own pets of the same breed, and dogs of comparable weight. The failure of cat weights to predict anything can be explained by the low variance in cat weights in general, as compared to the high variance of dog weights.

The indicators based on the friendship network achieve AUC values from 70% to 90%, also indicating good prediction performance. The only exception is the existence of a friendship link itself, whose AUC is very near 1/2. We may interpret this as users not being sure what to make of the possibility of connecting two of their own pets with a friendship link; some users do it and some do not. This result is consistent with the symmetric shape of the plots in Figure 4 (d-f), which indicate that paths of even length of friendship links should be used.


8. SUMMARY AND CONCLUSIONS 

We have analysed the three online pet social networks Catster, Dogster and Hamsterster as multi-profile networks, since they allow individual users to create any number of profiles, one for each of their pets. We have shown that multi-profile networks can be analysed on two levels: the profile level and the account level. Our experiments showed that the two networks are related, but not identical, as the profile-level network is smaller, has smaller degrees, has a more equal degree distribution, less clustering and lower average path lengths. We also showed that a multi-profile network implicitly contains household links, and therefore a comparison between friendship and household links can be performed. We confirmed through a homophily analysis that intra-household homophily is higher than across-friendship homophily, and defined the multi-profile assortativity ratio in order to measure that difference. In experiments, we found that the pet breed, join age and weight display the highest differences. Through extended spectral tests of diagonality, we were able to discover the relationship between friendship links and family ties in the network. Finally, we showed that it is possible to predict whether two profiles were created by the same user with a very high precision. Given this high precision, we conclude that it should be possible in principle to analyse the behavior of users creating multiple accounts on social networking platforms where this is not allowed. While corresponding datasets are inherently difficult to come by, such an analysis would shed light on user behavior in terms of whether the profiles they create can be considered individual actors in the social network, or whether the person-level network should rather be considered. Although the methods developed in this paper can be applied to such datasets, we do not expect the individual numerical results to hold for the individual features, as users who know that the creation of multiple accounts is not allowed can be expected to behave quite differently from users who are allowed to do this.